Self  Equivalence  of  the  Alternating  Direction  Method  of  Multipliers 


Ming  Yan*  Wotao  Yin* 

August  11,  2014 


Abstract 

In  this  paper,  we  show  interesting  self  equivalence  results  for  the  alternating  direction  method  of 
multipliers  (ADM  or  ADMM).  Specifically,  we  show  that  ADM  on  a  primal  problem  is  equivalent  to 
ADM  on  its  Lagrange  dual  problem;  ADM  is  equivalent  to  a  primal-dual  algorithm  applied  to  a  saddle- 
point  formulation  of  the  problem;  when  one  of  the  two  objective  functions  is  quadratic  with  an  affine 
domain,  we  can  swap  the  update  order  of  the  two  variables  in  ADM  and  obtain  an  equivalent  algorithm. 

An  example  in  extended  monotropic  programming  is  given  to  demonstrate  that  the  primal-dual  algorithm 
may  be  preferable  over  the  other  equivalent  algorithms  for  its  lower  per-iteration  complexity  and,  in  the 
setting  of  distributed  computation,  better  load  balancing. 

Keywords:  Alternating  Direction  Method  of  Multipliers  (ADM/ADMM),  Douglas-Rachford  Splitting 
(DRS),  Primal-Dual  Algorithm,  Extended  Monotropic  Programming,  Total  Variation 


1  Introduction 


The  alternating  direction  method  of  multipliers  (ADM  or  ADMM)  is  very  effective  at  solving  complicated 
convex  optimization  problems.  It  applies  to  linearly-constrained  convex  optimization  problems  with  separable 
objective  functions  in  the  following  form: 


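$$\operatorname*{minimize}_{\mathbf{x},\mathbf{y}} \; f(\mathbf{x}) + g(\mathbf{y}) \quad \text{subject to} \quad \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} = \mathbf{b}, \tag{P1}$$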


where $f$, $g$ are proper, closed, convex functions (not necessarily differentiable) and $\mathbf{A}$, $\mathbf{B}$ are linear mappings. ADM has been applied to both the primal and dual problems in many applications. For example, it was applied to both the primal and dual problems for $\ell_1$ minimization in [20, 19]. As another example, ADM was applied to the dual problem in [12] and to the corresponding primal problem in [11].

In  this  paper,  we  show  the  following  equivalence  results  for  ADM: 

1. It is equivalent to apply ADM to either the original form or the Lagrange dual of (P1).

2. ADM on either (P1) or its dual is equivalent to a primal-dual algorithm applied to a saddle-point formulation of (P1); in the latter algorithm, since one of the primal variables is hidden, each iteration may have a lower complexity than the other equivalent ones, as we shall demonstrate by an example.

3. Whenever either $f$ or $g$ is a quadratic (or affine or linear) function, defined on either the whole space or an affine domain, swapping the order of $\mathbf{x}$ and $\mathbf{y}$ in ADM yields an equivalent algorithm.

In all three cases, given the iterates of one algorithm, we can recover the iterates of the equivalent algorithm by properly setting the initial iterates of the latter.


1.1  Notation  and  assumptions 

Let $H_1$, $H_2$, and $Q$ be (possibly infinite dimensional) Hilbert spaces. Bold lowercase letters such as $\mathbf{x}$, $\mathbf{y}$, $\mathbf{u}$, and $\mathbf{v}$ are used for points in the Hilbert spaces. In the example of (P1), we have $\mathbf{x} \in H_1$, $\mathbf{y} \in H_2$, and $\mathbf{b} \in Q$. When the Hilbert space a point belongs to is clear from the context, we do not specify it for the sake of simplicity. The inner product between points $\mathbf{x}$ and $\mathbf{y}$ is denoted by $\langle \mathbf{x}, \mathbf{y} \rangle$, and $\|\mathbf{x}\|_2 := \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$ is the corresponding norm. $\|\cdot\|_1$ and $\|\cdot\|_\infty$ denote the $\ell_1$ and $\ell_\infty$ norms, respectively. Bold uppercase letters such as $\mathbf{A}$ and $\mathbf{B}$ are used for both continuous linear mappings and matrices. $\mathbf{A}^*$ denotes the adjoint of $\mathbf{A}$. $\mathbf{I}$ denotes the identity map.

Both lower and upper case letters such as $f$, $g$, $F$, and $G$ are used for functions. $\iota_C$ denotes the indicator function of the set $C$, which is assumed to be convex and nonempty. It is defined as follows:

$$\iota_C(\mathbf{x}) = \begin{cases} 0, & \text{if } \mathbf{x} \in C, \\ \infty, & \text{if } \mathbf{x} \notin C. \end{cases}$$


We  make  the  following  assumption  throughout  the  paper: 


Assumption  1.  Functions  in  this  paper  are  assumed  to  be  proper,  closed,  and  convex.  The  saddle-point 
solutions  to  all  the  optimization  problems  in  this  paper  are  assumed  to  exist. 

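Let $\partial f(\mathbf{x})$ be the subdifferential of function $f$ at $\mathbf{x}$. The proximal operator $\operatorname{prox}_{f(\cdot)}$ is defined as

$$\operatorname{prox}_{f(\cdot)}(\mathbf{x}) = \arg\min_{\mathbf{y}} \; f(\mathbf{y}) + \tfrac{1}{2}\|\mathbf{y} - \mathbf{x}\|_2^2,$$

where the minimization has a unique solution. The convex conjugate $f^*$ of function $f$ is defined as

$$f^*(\mathbf{v}) = \sup_{\mathbf{x}} \{\langle \mathbf{v}, \mathbf{x} \rangle - f(\mathbf{x})\}.$$

Let $\mathcal{P}_{B_1^\infty}$ be the projection onto the unit $\ell_\infty$ "ball" $B_1^\infty := \{\mathbf{x} : \|\mathbf{x}\|_\infty \le 1\}$.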
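As a small illustration of these definitions, the following Python sketch evaluates the proximal operator of the $\ell_1$ norm (soft-thresholding) and the projection onto the unit $\ell_\infty$ ball, and checks the Moreau decomposition $\mathbf{x} = \operatorname{prox}_{f}(\mathbf{x}) + \operatorname{prox}_{f^*}(\mathbf{x})$ for $f = \|\cdot\|_1$, whose conjugate is the indicator function of that ball. The function names and the test vector are illustrative assumptions, not part of the paper.

```python
# Sketch: prox of the l1 norm and projection onto the unit l-infinity ball,
# written for plain Python lists. Names and data are illustrative only.

def prox_l1(x, tau=1.0):
    """prox of tau*||.||_1 at x: componentwise soft-thresholding."""
    return [max(abs(xi) - tau, 0.0) * (1.0 if xi > 0 else -1.0) for xi in x]


def project_linf_ball(x, radius=1.0):
    """Projection onto {x : ||x||_inf <= radius}: componentwise clipping."""
    return [min(max(xi, -radius), radius) for xi in x]


x = [2.5, -0.3, 1.0, -4.0]
p = prox_l1(x)            # prox of f = ||.||_1
q = project_linf_ball(x)  # prox of f* = indicator of the unit l-inf ball

# Moreau decomposition: x = prox_f(x) + prox_{f*}(x), componentwise.
residual = max(abs(xi - pi - qi) for xi, pi, qi in zip(x, p, q))
```

The two operators split every component of `x` into a part of magnitude at most one and a remainder, which is the mechanism behind the primal-dual correspondences used later for basis pursuit.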


1.2 Organization

This paper is organized as follows. The three equivalence results for ADM are shown in sections 2, 3, and 4. The primal-dual equivalence is discussed in section 2. ADM is shown to be equivalent to a primal-dual algorithm applied to the saddle-point formulation in section 3. In section 4, we show that swapping the order of $\mathbf{x}$ and $\mathbf{y}$ in ADM yields an equivalent algorithm if $f$ or $g$ satisfies the condition mentioned above. We conclude this paper with two applications of our results: extended monotropic programming in section 5 and total variation image denoising in section 6.


2  Equivalence  of  ADM  on  primal  and  dual  problems 

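In this section we show that ADM applied to (P1) is equivalent to ADM applied to the Lagrange dual of (P1). Algorithm 1 describes how ADM is applied to (P1) [13, 14].

Algorithm 1 ADM on (P1)
initialize $\mathbf{x}_1^0$, $\mathbf{z}_1^0$, $\lambda > 0$
for $k = 0, 1, \cdots$ do
    $\mathbf{y}_1^{k+1} = \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x}_1^k + \mathbf{B}\mathbf{y} - \mathbf{b} + \lambda\mathbf{z}_1^k\|_2^2$
    $\mathbf{x}_1^{k+1} = \arg\min_{\mathbf{x}} \; f(\mathbf{x}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b} + \lambda\mathbf{z}_1^k\|_2^2$
    $\mathbf{z}_1^{k+1} = \mathbf{z}_1^k + \lambda^{-1}(\mathbf{A}\mathbf{x}_1^{k+1} + \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b})$
end for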


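A primal formulation equivalent to (P1) is

$$\operatorname*{minimize}_{\mathbf{s},\mathbf{t}} \; F(\mathbf{s}) + G(\mathbf{t}) \quad \text{subject to} \quad \mathbf{s} + \mathbf{t} = \mathbf{0}, \tag{P2}$$

where $\mathbf{s}, \mathbf{t} \in Q$ and

$$F(\mathbf{s}) := \min_{\mathbf{x}} \; f(\mathbf{x}) + \iota_{\{\mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{s}\}}(\mathbf{x}), \tag{1a}$$

$$G(\mathbf{t}) := \min_{\mathbf{y}} \; g(\mathbf{y}) + \iota_{\{\mathbf{y} : \mathbf{B}\mathbf{y} - \mathbf{b} = \mathbf{t}\}}(\mathbf{y}). \tag{1b}$$

Remark 1. If we define $L_f$ and $L_g$ as $L_f(\mathbf{x}) = \mathbf{A}\mathbf{x}$ and $L_g(\mathbf{y}) = \mathbf{B}\mathbf{y} - \mathbf{b}$, respectively, then $F$ and $G$ are known as the infimal postcompositions of $f$ and $g$ by $L_f$ and $L_g$, respectively, according to [1, Def. 12.33]. They are written as

$$F = L_f \triangleright f, \qquad G = L_g \triangleright g.$$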

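Algorithm 2 gives ADM applied to (P2). We will show that Algorithms 1 and 2 are (trivially) equivalent.


Algorithm 2 ADM on (P2)
initialize $\mathbf{s}_2^0$, $\mathbf{z}_2^0$, $\lambda > 0$
for $k = 0, 1, \cdots$ do
    $\mathbf{t}_2^{k+1} = \arg\min_{\mathbf{t}} \; G(\mathbf{t}) + (2\lambda)^{-1}\|\mathbf{s}_2^k + \mathbf{t} + \lambda\mathbf{z}_2^k\|_2^2$
    $\mathbf{s}_2^{k+1} = \arg\min_{\mathbf{s}} \; F(\mathbf{s}) + (2\lambda)^{-1}\|\mathbf{s} + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k\|_2^2$
    $\mathbf{z}_2^{k+1} = \mathbf{z}_2^k + \lambda^{-1}(\mathbf{s}_2^{k+1} + \mathbf{t}_2^{k+1})$
end for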


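The Lagrange dual problem to (P1) is

$$\operatorname*{minimize}_{\mathbf{v}} \; f^*(-\mathbf{A}^*\mathbf{v}) + g^*(-\mathbf{B}^*\mathbf{v}) + \langle \mathbf{v}, \mathbf{b} \rangle, \tag{2}$$

which can be derived from $\min_{\mathbf{v}} \left(-\min_{\mathbf{x},\mathbf{y}} L(\mathbf{x}, \mathbf{y}, \mathbf{v})\right)$ with the Lagrangian

$$L(\mathbf{x}, \mathbf{y}, \mathbf{v}) = f(\mathbf{x}) + g(\mathbf{y}) + \langle \mathbf{v}, \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{b} \rangle.$$

An ADM-ready formulation of (2) is

$$\operatorname*{minimize}_{\mathbf{u},\mathbf{v}} \; f^*(-\mathbf{A}^*\mathbf{u}) + g^*(-\mathbf{B}^*\mathbf{v}) + \langle \mathbf{v}, \mathbf{b} \rangle \quad \text{subject to} \quad \mathbf{u} - \mathbf{v} = \mathbf{0}. \tag{D1}$$

When ADM is applied to an ADM-ready formulation of a Lagrange dual problem, we call it Dual ADM. The original ADM is called Primal ADM.

Following similar steps, the ADM-ready formulation of the Lagrange dual to (P2) is

$$\operatorname*{minimize}_{\mathbf{u},\mathbf{v}} \; F^*(-\mathbf{u}) + G^*(-\mathbf{v}) \quad \text{subject to} \quad \mathbf{u} - \mathbf{v} = \mathbf{0}. \tag{D2}$$

The equivalence between (D1) and (D2) is trivial since

$$F^*(\mathbf{u}) = f^*(\mathbf{A}^*\mathbf{u}), \qquad G^*(\mathbf{v}) = g^*(\mathbf{B}^*\mathbf{v}) - \langle \mathbf{v}, \mathbf{b} \rangle,$$

which follows from Lemma 1.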

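Lemma 1. If $L$ is affine and can be expressed as $L(\cdot) = \mathbf{A}\cdot + \mathbf{b}$, the convex conjugate of $L \triangleright f$, the infimal postcomposition of $f$ by $L$, can be found as follows:

$$(L \triangleright f)^*(\cdot) = f^*(\mathbf{A}^*\cdot) + \langle \cdot, \mathbf{b} \rangle.$$

Proof. Following from the definitions of convex conjugate and infimal postcomposition, we have

$$\begin{aligned}
(L \triangleright f)^*(\mathbf{v}) &= \sup_{\mathbf{y}} \; \langle \mathbf{v}, \mathbf{y} \rangle - (L \triangleright f)(\mathbf{y}) = \sup_{\mathbf{x}} \; \langle \mathbf{v}, \mathbf{A}\mathbf{x} + \mathbf{b} \rangle - f(\mathbf{x}) \\
&= \sup_{\mathbf{x}} \; \langle \mathbf{A}^*\mathbf{v}, \mathbf{x} \rangle - f(\mathbf{x}) + \langle \mathbf{v}, \mathbf{b} \rangle = f^*(\mathbf{A}^*\mathbf{v}) + \langle \mathbf{v}, \mathbf{b} \rangle.
\end{aligned}$$

□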


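We apply ADM on (D1)/(D2) in Algorithm 3.


Algorithm 3 ADM on (D1)/(D2)
initialize $\mathbf{u}_3^0$, $\mathbf{z}_3^0$, $\lambda > 0$
for $k = 0, 1, \cdots$ do
    $\mathbf{v}_3^{k+1} = \arg\min_{\mathbf{v}} \; G^*(-\mathbf{v}) + \frac{\lambda}{2}\|\mathbf{u}_3^k - \mathbf{v} + \lambda^{-1}\mathbf{z}_3^k\|_2^2$
    $\mathbf{u}_3^{k+1} = \arg\min_{\mathbf{u}} \; F^*(-\mathbf{u}) + \frac{\lambda}{2}\|\mathbf{u} - \mathbf{v}_3^{k+1} + \lambda^{-1}\mathbf{z}_3^k\|_2^2$
    $\mathbf{z}_3^{k+1} = \mathbf{z}_3^k + \lambda(\mathbf{u}_3^{k+1} - \mathbf{v}_3^{k+1})$
end for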


The following equivalence is shown in Theorem 1:

ADM on (P1) $\Longleftrightarrow$ ADM on (P2) $\Longleftrightarrow$ ADM on (D1)/(D2)

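Theorem 1 (Equivalence of Algorithms 1-3). Suppose $\mathbf{A}\mathbf{x}_1^0 = \mathbf{s}_2^0 = \mathbf{z}_3^0$ and $\mathbf{z}_1^0 = \mathbf{z}_2^0 = \mathbf{u}_3^0$ and that the same parameter $\lambda$ is used in Algorithms 1-3. Then, their equivalence can be established as follows:

1. From $\mathbf{x}_1^k$, $\mathbf{y}_1^k$, $\mathbf{z}_1^k$ of Algorithm 1, we obtain $\mathbf{t}_2^k$, $\mathbf{s}_2^k$, $\mathbf{z}_2^k$ of Algorithm 2 through:

$$\mathbf{t}_2^k = \mathbf{B}\mathbf{y}_1^k - \mathbf{b}, \tag{3a}$$

$$\mathbf{s}_2^k = \mathbf{A}\mathbf{x}_1^k, \tag{3b}$$

$$\mathbf{z}_2^k = \mathbf{z}_1^k. \tag{3c}$$

From $\mathbf{t}_2^k$, $\mathbf{s}_2^k$, $\mathbf{z}_2^k$ of Algorithm 2, we obtain $\mathbf{y}_1^k$, $\mathbf{x}_1^k$, $\mathbf{z}_1^k$ of Algorithm 1 through:

$$\mathbf{y}_1^k = \arg\min_{\mathbf{y}} \{g(\mathbf{y}) : \mathbf{B}\mathbf{y} - \mathbf{b} = \mathbf{t}_2^k\}, \tag{4a}$$

$$\mathbf{x}_1^k = \arg\min_{\mathbf{x}} \{f(\mathbf{x}) : \mathbf{A}\mathbf{x} = \mathbf{s}_2^k\}, \tag{4b}$$

$$\mathbf{z}_1^k = \mathbf{z}_2^k. \tag{4c}$$

2. We can recover the iterates of Algorithms 2 and 3 from each other through

$$\mathbf{u}_3^k = \mathbf{z}_2^k, \qquad \mathbf{z}_3^k = \mathbf{s}_2^k. \tag{5}$$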


Proof. Part 1. Proof by induction.

We argue that under (3b) and (3c), Algorithms 1 and 2 have essentially identical subproblems in their first steps at the $k$th iteration. Consider the following problem, which is obtained by plugging the definition of $G(\cdot)$ into the $\mathbf{t}_2^{k+1}$-subproblem of Algorithm 2:

$$(\mathbf{y}_1^{k+1}, \mathbf{t}_2^{k+1}) = \arg\min_{\mathbf{y},\mathbf{t}} \; g(\mathbf{y}) + \iota_{\{(\mathbf{y},\mathbf{t}) : \mathbf{B}\mathbf{y} - \mathbf{b} = \mathbf{t}\}}(\mathbf{y}, \mathbf{t}) + (2\lambda)^{-1}\|\mathbf{s}_2^k + \mathbf{t} + \lambda\mathbf{z}_2^k\|_2^2. \tag{6}$$

If one minimizes over $\mathbf{y}$ first while keeping $\mathbf{t}$ as a variable, one eliminates $\mathbf{y}$ and recovers the $\mathbf{t}_2^{k+1}$-subproblem of Algorithm 2. If one minimizes over $\mathbf{t}$ first while keeping $\mathbf{y}$ as a variable, then after plugging in (3b) and (3c), problem (6) reduces to the $\mathbf{y}_1^{k+1}$-subproblem of Algorithm 1. In addition, $(\mathbf{y}_1^{k+1}, \mathbf{t}_2^{k+1})$ obeys

$$\mathbf{t}_2^{k+1} = \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b}, \tag{7}$$

which is (3a) at $k + 1$. Plugging $\mathbf{t} = \mathbf{t}_2^{k+1}$ into (6) yields problem (4a) for $\mathbf{y}_1^{k+1}$, which must be equivalent to the $\mathbf{y}_1^{k+1}$-subproblem of Algorithm 1. Therefore, the $\mathbf{y}_1^{k+1}$-subproblem of Algorithm 1 and the $\mathbf{t}_2^{k+1}$-subproblem of Algorithm 2 are equivalent through (3a) and (4a) at $k + 1$, respectively.

Similarly, under (7) and (3c), we can show that the $\mathbf{x}_1^{k+1}$-subproblem of Algorithm 1 and the $\mathbf{s}_2^{k+1}$-subproblem of Algorithm 2 are equivalent through the formulas for (3b) and (4b) at $k + 1$, respectively.

Finally, under (3a) and (3b) at $k + 1$ and $\mathbf{z}_2^k = \mathbf{z}_1^k$, the formulas for $\mathbf{z}_1^{k+1}$ and $\mathbf{z}_2^{k+1}$ in Algorithms 1 and 2 are identical, and they return $\mathbf{z}_1^{k+1} = \mathbf{z}_2^{k+1}$, which is (3c) and (4c) at $k + 1$.

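Part 2. Proof by induction. Suppose that (5) holds at iteration $k$, that is, $\mathbf{u}_3^k = \mathbf{z}_2^k$ and $\mathbf{z}_3^k = \mathbf{s}_2^k$. We shall show that (5) holds at $k + 1$. Starting from the optimality condition of the $\mathbf{t}_2^{k+1}$-subproblem of Algorithm 2, we derive

$$\begin{aligned}
& \mathbf{0} \in \partial G(\mathbf{t}_2^{k+1}) + \lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k) \\
\Longleftrightarrow \; & \mathbf{t}_2^{k+1} \in \partial G^*(-\lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k)) \\
\Longleftrightarrow \; & \lambda\left[\lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k)\right] - (\lambda\mathbf{z}_2^k + \mathbf{s}_2^k) \in \partial G^*(-\lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k)) \\
\Longleftrightarrow \; & -\lambda\left[\lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k)\right] + (\lambda\mathbf{u}_3^k + \mathbf{z}_3^k) \in -\partial G^*(-\lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k)) \\
\Longleftrightarrow \; & \mathbf{0} \in -\partial G^*(-\lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k)) - \lambda\left[\mathbf{u}_3^k - \lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k) + \lambda^{-1}\mathbf{z}_3^k\right] \\
\Longleftrightarrow \; & \mathbf{v}_3^{k+1} = \lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k) = \lambda^{-1}(\mathbf{z}_3^k + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k),
\end{aligned}$$

where the last equivalence follows from the optimality condition for the $\mathbf{v}_3^{k+1}$-subproblem of Algorithm 3.

Starting from the optimality condition of the $\mathbf{s}_2^{k+1}$-subproblem of Algorithm 2, and applying the update $\mathbf{z}_2^{k+1} = \mathbf{z}_2^k + \lambda^{-1}(\mathbf{s}_2^{k+1} + \mathbf{t}_2^{k+1})$ in Algorithm 2 and the identity for $\mathbf{v}_3^{k+1}$ obtained above, we derive

$$\begin{aligned}
& \mathbf{0} \in \partial F(\mathbf{s}_2^{k+1}) + \lambda^{-1}(\mathbf{s}_2^{k+1} + \mathbf{t}_2^{k+1} + \lambda\mathbf{z}_2^k) \\
\Longleftrightarrow \; & \mathbf{0} \in \partial F(\mathbf{s}_2^{k+1}) + \mathbf{z}_2^{k+1} \\
\Longleftrightarrow \; & \mathbf{0} \in \mathbf{s}_2^{k+1} - \partial F^*(-\mathbf{z}_2^{k+1}) \\
\Longleftrightarrow \; & \mathbf{0} \in \lambda(\mathbf{z}_2^{k+1} - \mathbf{z}_2^k) - \mathbf{t}_2^{k+1} - \partial F^*(-\mathbf{z}_2^{k+1}) \\
\Longleftrightarrow \; & \mathbf{0} \in \lambda(\mathbf{z}_2^{k+1} - \mathbf{z}_2^k) + \mathbf{z}_3^k + \lambda(\mathbf{u}_3^k - \mathbf{v}_3^{k+1}) - \partial F^*(-\mathbf{z}_2^{k+1}) \\
\Longleftrightarrow \; & \mathbf{0} \in -\partial F^*(-\mathbf{z}_2^{k+1}) + \lambda(\mathbf{z}_2^{k+1} - \mathbf{v}_3^{k+1} + \lambda^{-1}\mathbf{z}_3^k) \\
\Longleftrightarrow \; & \mathbf{z}_2^{k+1} = \mathbf{u}_3^{k+1},
\end{aligned}$$

where the last equivalence follows from the optimality condition for the $\mathbf{u}_3^{k+1}$-subproblem of Algorithm 3.

Finally, combining the update formulas of $\mathbf{z}_2^{k+1}$ and $\mathbf{z}_3^{k+1}$ in Algorithms 2 and 3, respectively, as well as the identities for $\mathbf{u}_3^{k+1}$ and $\mathbf{v}_3^{k+1}$ obtained above, we obtain

$$\begin{aligned}
\mathbf{z}_3^{k+1} &= \mathbf{z}_3^k + \lambda(\mathbf{u}_3^{k+1} - \mathbf{v}_3^{k+1}) = \mathbf{s}_2^k + \lambda\left(\mathbf{z}_2^{k+1} - \mathbf{z}_2^k - \lambda^{-1}(\mathbf{s}_2^k + \mathbf{t}_2^{k+1})\right) \\
&= \lambda(\mathbf{z}_2^{k+1} - \mathbf{z}_2^k) - \mathbf{t}_2^{k+1} = \mathbf{s}_2^{k+1}.
\end{aligned}$$

□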


Remark 2. Following Part 1 of the theorem, we can view problem (P2) as the master problem of (P1), whereas the two subproblems in (1) are independent. We can say that ADM is essentially an algorithm applied only to the master problem (P2), which is Algorithm 2; this fact has been obscured by the often-seen Algorithm 1, which integrates ADM on the master problem with the independent subproblems.

Part 2 of the theorem shows that ADM is a symmetric primal-dual algorithm. The reciprocal positions of parameter $\lambda$ indicate its function to "balance" the primal and dual progress.

Remark 3. ADM's primal-dual equivalence can also be derived by combining the following two equivalence results: (i) the equivalence between ADM on the primal problem and the Douglas-Rachford splitting (DRS) algorithm [7, 16] on the dual problem [12], and (ii) the equivalence result between DRS algorithms applied to the master problem (P2) and its dual problem (cf. [8, Chapter 3.5][9]). In this paper, however, we provide an elementary algebraic proof in order to derive the formulas in Theorem 1 that recover the iterates of one algorithm from another.

Next  we  give  two  concrete  examples  that  illustrate  the  equivalence. 


2.1  Example:  basis  pursuit 

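The basis pursuit problem seeks the minimal $\ell_1$ solution to a set of linear equations:

$$\operatorname*{minimize}_{\mathbf{u}} \; \|\mathbf{u}\|_1 \quad \text{subject to} \quad \mathbf{A}\mathbf{u} = \mathbf{b}. \tag{8}$$

Its Lagrange dual problem is

$$\operatorname*{minimize}_{\mathbf{x}} \; -\mathbf{b}^T\mathbf{x} \quad \text{subject to} \quad \|\mathbf{A}^*\mathbf{x}\|_\infty \le 1. \tag{9}$$

The YALL1 algorithms [20] implement ADMs on a set of primal and dual formulations for basis pursuit and LASSO, yet ADM for (8) is not given (however, a linearized ADM is given for (8)). Although seemingly awkward, problem (8) can be turned equivalently into the ADM-ready form

$$\operatorname*{minimize}_{\mathbf{u},\mathbf{v}} \; \|\mathbf{v}\|_1 + \iota_{\{\mathbf{u} : \mathbf{A}\mathbf{u} = \mathbf{b}\}}(\mathbf{u}) \quad \text{subject to} \quad \mathbf{u} - \mathbf{v} = \mathbf{0}. \tag{10}$$

Similarly, problem (9) can be turned equivalently into the ADM-ready form

$$\operatorname*{minimize}_{\mathbf{x},\mathbf{y}} \; -\mathbf{b}^T\mathbf{x} + \iota_{B_1^\infty}(\mathbf{y}) \quad \text{subject to} \quad \mathbf{A}^*\mathbf{x} - \mathbf{y} = \mathbf{0}. \tag{11}$$

For simplicity, let us suppose that $\mathbf{A}$ has full row rank so the inverse of $\mathbf{A}\mathbf{A}^*$ exists. (Otherwise, the equations $\mathbf{A}\mathbf{u} = \mathbf{b}$ are redundant whenever they are consistent, and $(\mathbf{A}\mathbf{A}^*)^{-1}$ shall be replaced by the pseudo-inverse below.) ADM for problem (10) can be simplified to the iteration:

$$\mathbf{v}_3^{k+1} = \arg\min_{\mathbf{v}} \; \|\mathbf{v}\|_1 + \tfrac{\lambda}{2}\|\mathbf{u}_3^k - \mathbf{v} + \tfrac{1}{\lambda}\mathbf{z}_3^k\|_2^2, \tag{12a}$$

$$\mathbf{u}_3^{k+1} = \mathbf{v}_3^{k+1} - \tfrac{1}{\lambda}\mathbf{z}_3^k - \mathbf{A}^*(\mathbf{A}\mathbf{A}^*)^{-1}\left(\mathbf{A}(\mathbf{v}_3^{k+1} - \tfrac{1}{\lambda}\mathbf{z}_3^k) - \mathbf{b}\right), \tag{12b}$$

$$\mathbf{z}_3^{k+1} = \mathbf{z}_3^k + \lambda(\mathbf{u}_3^{k+1} - \mathbf{v}_3^{k+1}). \tag{12c}$$

ADM for problem (11) can be simplified to the iteration:

$$\mathbf{y}_1^{k+1} = \mathcal{P}_{B_1^\infty}(\mathbf{A}^*\mathbf{x}_1^k + \lambda\mathbf{z}_1^k), \tag{13a}$$

$$\mathbf{x}_1^{k+1} = (\mathbf{A}\mathbf{A}^*)^{-1}\left(\mathbf{A}\mathbf{y}_1^{k+1} - \lambda(\mathbf{A}\mathbf{z}_1^k - \mathbf{b})\right), \tag{13b}$$

$$\mathbf{z}_1^{k+1} = \mathbf{z}_1^k + \lambda^{-1}(\mathbf{A}^*\mathbf{x}_1^{k+1} - \mathbf{y}_1^{k+1}). \tag{13c}$$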


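The corollary below follows directly from Theorem 1 by associating (11) and (10) with (P1) and (D2), and (13) and (12) with the iterations of Algorithms 1 and 3, respectively.

Corollary 1. Suppose that $\mathbf{A}\mathbf{u} = \mathbf{b}$ is consistent. Consider ADM iterations (12) and (13). Let $\mathbf{u}_3^0 = \mathbf{z}_1^0$ and $\mathbf{z}_3^0 = \mathbf{A}^*\mathbf{x}_1^0$. Then, for $k \ge 1$, iterations (12) and (13) are equivalent. In particular,

• From $\mathbf{x}_1^k$, $\mathbf{z}_1^k$ in (13), we obtain $\mathbf{u}_3^k$, $\mathbf{z}_3^k$ in (12) through:

$$\mathbf{u}_3^k = \mathbf{z}_1^k, \qquad \mathbf{z}_3^k = \mathbf{A}^*\mathbf{x}_1^k.$$

• From $\mathbf{u}_3^k$, $\mathbf{z}_3^k$ in (12), we obtain $\mathbf{x}_1^k$, $\mathbf{z}_1^k$ in (13) through:

$$\mathbf{x}_1^k = (\mathbf{A}\mathbf{A}^*)^{-1}\mathbf{A}\mathbf{z}_3^k, \qquad \mathbf{z}_1^k = \mathbf{u}_3^k.$$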

2.2  Example:  basis  pursuit  denoising 

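The basis pursuit denoising problem is

$$\operatorname*{minimize}_{\mathbf{u}} \; \|\mathbf{u}\|_1 + \frac{1}{2\alpha}\|\mathbf{A}\mathbf{u} - \mathbf{b}\|_2^2, \tag{14}$$

and its Lagrange dual problem, in the ADM-ready form, is

$$\operatorname*{minimize}_{\mathbf{x},\mathbf{y}} \; -\langle \mathbf{b}, \mathbf{x} \rangle + \frac{\alpha}{2}\|\mathbf{x}\|_2^2 + \iota_{B_1^\infty}(\mathbf{y}) \quad \text{subject to} \quad \mathbf{A}^*\mathbf{x} - \mathbf{y} = \mathbf{0}. \tag{15}$$

The iteration of ADM for (15) is

$$\mathbf{y}_1^{k+1} = \mathcal{P}_{B_1^\infty}(\mathbf{A}^*\mathbf{x}_1^k + \lambda\mathbf{z}_1^k), \tag{16a}$$

$$\mathbf{x}_1^{k+1} = (\mathbf{A}\mathbf{A}^* + \alpha\lambda\mathbf{I})^{-1}\left(\mathbf{A}\mathbf{y}_1^{k+1} - \lambda(\mathbf{A}\mathbf{z}_1^k - \mathbf{b})\right), \tag{16b}$$

$$\mathbf{z}_1^{k+1} = \mathbf{z}_1^k + \lambda^{-1}(\mathbf{A}^*\mathbf{x}_1^{k+1} - \mathbf{y}_1^{k+1}). \tag{16c}$$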

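The ADM-ready form of the original problem (14) is

$$\operatorname*{minimize}_{\mathbf{u},\mathbf{v}} \; \|\mathbf{v}\|_1 + \frac{1}{2\alpha}\|\mathbf{A}\mathbf{u} - \mathbf{b}\|_2^2 \quad \text{subject to} \quad \mathbf{u} - \mathbf{v} = \mathbf{0}, \tag{17}$$

whose ADM iteration is

$$\mathbf{v}_3^{k+1} = \arg\min_{\mathbf{v}} \; \|\mathbf{v}\|_1 + \tfrac{\lambda}{2}\|\mathbf{u}_3^k - \mathbf{v} + \tfrac{1}{\lambda}\mathbf{z}_3^k\|_2^2, \tag{18a}$$

$$\mathbf{u}_3^{k+1} = (\mathbf{A}^*\mathbf{A} + \alpha\lambda\mathbf{I})^{-1}(\mathbf{A}^*\mathbf{b} + \alpha\lambda\mathbf{v}_3^{k+1} - \alpha\mathbf{z}_3^k), \tag{18b}$$

$$\mathbf{z}_3^{k+1} = \mathbf{z}_3^k + \lambda(\mathbf{u}_3^{k+1} - \mathbf{v}_3^{k+1}). \tag{18c}$$

The corollary below follows directly from Theorem 1.

Corollary 2. Consider ADM iterations (16) and (18). Let $\mathbf{u}_3^0 = \mathbf{z}_1^0$ and $\mathbf{z}_3^0 = \mathbf{A}^*\mathbf{x}_1^0$. Then ADM on the dual and primal problems, (16) and (18), is equivalent in the following way:

• From $\mathbf{x}_1^k$, $\mathbf{z}_1^k$ in (16), we recover $\mathbf{u}_3^k$, $\mathbf{z}_3^k$ in (18) through:

$$\mathbf{u}_3^k = \mathbf{z}_1^k, \qquad \mathbf{z}_3^k = \mathbf{A}^*\mathbf{x}_1^k.$$

• From $\mathbf{u}_3^k$, $\mathbf{z}_3^k$ in (18), we recover $\mathbf{x}_1^k$, $\mathbf{z}_1^k$ in (16) through:

$$\mathbf{x}_1^k = -(\mathbf{A}\mathbf{u}_3^k - \mathbf{b})/\alpha, \qquad \mathbf{z}_1^k = \mathbf{u}_3^k.$$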
Remark 4. Iteration (18) is different from that of ADM for another ADM-ready form of (14),

$$\operatorname*{minimize}_{\mathbf{u},\mathbf{v}} \; \|\mathbf{u}\|_1 + \frac{1}{2\alpha}\|\mathbf{v}\|_2^2 \quad \text{subject to} \quad \mathbf{A}\mathbf{u} - \mathbf{v} = \mathbf{b}, \tag{19}$$

which is used in [20]. In general, there are different ADM-ready forms and their ADM algorithms yield different iterates. ADM on one ADM-ready form is equivalent to ADM on the corresponding dual ADM-ready form.

3  ADM  as  a  primal-dual  algorithm  on  a  saddle-point  problem 

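As shown in section 2, ADM on a pair of convex primal and dual problems are equivalent, and there is a connection between $\mathbf{z}_1^k$ in Algorithm 1 and the dual variable $\mathbf{u}_3^k$ in Algorithm 3. This primal-dual equivalence naturally suggests that ADM is also equivalent to a primal-dual algorithm involving both primal and dual variables.

We reformulate problem (P1) as an equivalent primal-dual saddle-point problem (21) as follows:

$$\begin{aligned}
& \min_{\mathbf{x},\mathbf{y}} \; g(\mathbf{y}) + f(\mathbf{x}) + \iota_{\{(\mathbf{x},\mathbf{y}) : \mathbf{A}\mathbf{x} = \mathbf{b} - \mathbf{B}\mathbf{y}\}}(\mathbf{x}, \mathbf{y}) \\
= \; & \min_{\mathbf{y}} \; g(\mathbf{y}) + F(\mathbf{b} - \mathbf{B}\mathbf{y}) \\
= \; & \min_{\mathbf{y}} \max_{\mathbf{u}} \; g(\mathbf{y}) + \langle -\mathbf{u}, \mathbf{b} - \mathbf{B}\mathbf{y} \rangle - F^*(-\mathbf{u}) \qquad (20) \\
= \; & \min_{\mathbf{y}} \max_{\mathbf{u}} \; g(\mathbf{y}) + \langle \mathbf{u}, \mathbf{B}\mathbf{y} - \mathbf{b} \rangle - f^*(-\mathbf{A}^*\mathbf{u}). \qquad (21)
\end{aligned}$$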

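A primal-dual algorithm for solving (21) is described in Algorithm 4. Theorem 2 establishes the equivalence between Algorithms 1 and 4.


Algorithm 4 Primal-dual formulation of ADM on Problem (21)
initialize $\mathbf{u}_5^0$, $\mathbf{u}_5^{-1}$, $\mathbf{y}_5^0$, $\lambda > 0$
for $k = 0, 1, \cdots$ do
    $\bar{\mathbf{u}}_5^k = 2\mathbf{u}_5^k - \mathbf{u}_5^{k-1}$
    $\mathbf{y}_5^{k+1} = \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\mathbf{B}\mathbf{y} - \mathbf{B}\mathbf{y}_5^k + \lambda\bar{\mathbf{u}}_5^k\|_2^2$
    $\mathbf{u}_5^{k+1} = \arg\min_{\mathbf{u}} \; f^*(-\mathbf{A}^*\mathbf{u}) - \langle \mathbf{u}, \mathbf{B}\mathbf{y}_5^{k+1} - \mathbf{b} \rangle + \frac{\lambda}{2}\|\mathbf{u} - \mathbf{u}_5^k\|_2^2$
end for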


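Remark 5. Paper [4] proposed a primal-dual algorithm for (20) and obtained the connection between ADM and that primal-dual algorithm [10]: when $\mathbf{B} = \mathbf{I}$, ADM is equivalent to the primal-dual algorithm in [4]; when $\mathbf{B} \ne \mathbf{I}$, the primal-dual algorithm is a preconditioned ADM, as an additional proximal term $\frac{\delta}{2}\|\mathbf{y} - \mathbf{y}_5^k\|_2^2 - (2\lambda)^{-1}\|\mathbf{B}\mathbf{y} - \mathbf{B}\mathbf{y}_5^k\|_2^2$ is added to the subproblem for $\mathbf{y}_5^{k+1}$. This is also a special case of inexact ADM in [6]. Our Algorithm 4 is a primal-dual algorithm that is equivalent to ADM in the general case.

Theorem 2 (Equivalence between Algorithms 1 and 4). Suppose that $\mathbf{A}\mathbf{x}_1^0 = \lambda(\mathbf{u}_5^0 - \mathbf{u}_5^{-1}) + \mathbf{b} - \mathbf{B}\mathbf{y}_5^0$ and $\mathbf{z}_1^0 = \mathbf{u}_5^0$. Then, Algorithms 1 and 4 are equivalent with the identities:

$$\mathbf{A}\mathbf{x}_1^k = \lambda(\mathbf{u}_5^k - \mathbf{u}_5^{k-1}) + \mathbf{b} - \mathbf{B}\mathbf{y}_5^k, \qquad \mathbf{z}_1^k = \mathbf{u}_5^k, \tag{22}$$

for all $k \ge 0$.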

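Proof. By assumption, (22) holds at iteration $k = 0$.

Proof by induction. Suppose that (22) holds at iteration $k \ge 0$. We shall establish (22) at iteration $k + 1$. From the first step of Algorithm 1, we have

$$\begin{aligned}
\mathbf{y}_1^{k+1} &= \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x}_1^k + \mathbf{B}\mathbf{y} - \mathbf{b} + \lambda\mathbf{z}_1^k\|_2^2 \\
&= \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\lambda(\mathbf{u}_5^k - \mathbf{u}_5^{k-1}) + \mathbf{B}\mathbf{y} - \mathbf{B}\mathbf{y}_5^k + \lambda\mathbf{u}_5^k\|_2^2,
\end{aligned}$$

which is the same as the first step in Algorithm 4. Thus we have $\mathbf{y}_1^{k+1} = \mathbf{y}_5^{k+1}$.

Combining the second and third steps of Algorithm 1, we have

$$\mathbf{0} \in \partial f(\mathbf{x}_1^{k+1}) + \lambda^{-1}\mathbf{A}^*(\mathbf{A}\mathbf{x}_1^{k+1} + \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b} + \lambda\mathbf{z}_1^k) = \partial f(\mathbf{x}_1^{k+1}) + \mathbf{A}^*\mathbf{z}_1^{k+1}.$$

Therefore,

$$\begin{aligned}
& \mathbf{x}_1^{k+1} \in \partial f^*(-\mathbf{A}^*\mathbf{z}_1^{k+1}) \\
\Longrightarrow \; & \mathbf{A}\mathbf{x}_1^{k+1} \in \partial F^*(-\mathbf{z}_1^{k+1}) \\
\Longleftrightarrow \; & \lambda(\mathbf{z}_1^{k+1} - \mathbf{z}_1^k) + \mathbf{b} - \mathbf{B}\mathbf{y}_1^{k+1} \in \partial F^*(-\mathbf{z}_1^{k+1}) \\
\Longleftrightarrow \; & \mathbf{z}_1^{k+1} = \arg\min_{\mathbf{z}} \; F^*(-\mathbf{z}) - \langle \mathbf{z}, \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b} \rangle + \tfrac{\lambda}{2}\|\mathbf{z} - \mathbf{z}_1^k\|_2^2 \\
\Longleftrightarrow \; & \mathbf{z}_1^{k+1} = \arg\min_{\mathbf{z}} \; f^*(-\mathbf{A}^*\mathbf{z}) - \langle \mathbf{z}, \mathbf{B}\mathbf{y}_5^{k+1} - \mathbf{b} \rangle + \tfrac{\lambda}{2}\|\mathbf{z} - \mathbf{u}_5^k\|_2^2,
\end{aligned}$$

where the last line is the second step of Algorithm 4. Therefore, we have $\mathbf{z}_1^{k+1} = \mathbf{u}_5^{k+1}$ and $\mathbf{A}\mathbf{x}_1^{k+1} = \lambda(\mathbf{z}_1^{k+1} - \mathbf{z}_1^k) + \mathbf{b} - \mathbf{B}\mathbf{y}_1^{k+1} = \lambda(\mathbf{u}_5^{k+1} - \mathbf{u}_5^k) + \mathbf{b} - \mathbf{B}\mathbf{y}_5^{k+1}$. □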
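Identities (22) can be sanity-checked on a scalar instance. The sketch below assumes $f(x) = x^2/2$ (so $f^*(v) = v^2/2$), $g(y) = |y|$, $\mathbf{A} = a$ and $\mathbf{B} = 1$, for which every subproblem of Algorithms 1 and 4 has a closed form. These choices, the numerical values, and the variable names are assumptions made only for illustration.

```python
# Sketch: Algorithm 1 versus Algorithm 4 on a scalar problem with
# f(x) = x^2/2, g(y) = |y|, A = a, B = 1 (all data hypothetical).

def soft(w, tau):
    """Scalar soft-thresholding, the prox of tau*|.|."""
    return max(abs(w) - tau, 0.0) * (1.0 if w > 0 else -1.0)


a, b, lam = 2.0, 1.0, 0.5

# Algorithm 4 state: u_prev = u^{-1}, u = u^0, y = y^0.
u_prev, u, y = 0.2, -0.4, 0.7
# Algorithm 1 state, initialized to satisfy the assumption of Theorem 2.
x1 = (lam * (u - u_prev) + b - y) / a
z1 = u

max_gap = 0.0
for k in range(40):
    # Algorithm 1: y-step, x-step, multiplier update
    y1 = soft(b - a * x1 - lam * z1, lam)
    x1 = -a * (y1 - b + lam * z1) / (lam + a * a)
    z1 = z1 + (a * x1 + y1 - b) / lam
    # Algorithm 4: extrapolation, y-step, u-step
    u_bar = 2.0 * u - u_prev
    y = soft(y - lam * u_bar, lam)
    u_prev, u = u, (lam * u + y - b) / (a * a + lam)
    # Identities (22) at iteration k + 1
    gap = max(abs(y1 - y), abs(z1 - u),
              abs(a * x1 - (lam * (u - u_prev) + b - y)))
    max_gap = max(max_gap, gap)
```

Note that Algorithm 4 never forms the primal variable `x1`; it is only recovered through the first identity in (22), which is where the per-iteration savings discussed later come from.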
4  Equivalence  of  ADM  for  different  orders


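In both problem (P1) and Algorithm 1, we can swap $\mathbf{x}$ and $\mathbf{y}$ and obtain Algorithm 5 below, which is still an ADM algorithm. In general, the two algorithms are different. In this section, we show that for a certain type of function $f$ (or $g$), Algorithms 1 and 5 become equivalent.


Algorithm 5 ADM2 on (P1)
initialize $\mathbf{y}_4^0$, $\mathbf{z}_4^0$, $\lambda > 0$
for $k = 0, 1, \cdots$ do
    $\mathbf{x}_4^{k+1} = \arg\min_{\mathbf{x}} \; f(\mathbf{x}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y}_4^k - \mathbf{b} + \lambda\mathbf{z}_4^k\|_2^2$
    $\mathbf{y}_4^{k+1} = \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y} - \mathbf{b} + \lambda\mathbf{z}_4^k\|_2^2$
    $\mathbf{z}_4^{k+1} = \mathbf{z}_4^k + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b})$
end for


The assumption that we need is that either $\operatorname{prox}_{F(\cdot)}$ or $\operatorname{prox}_{G(\cdot)}$ is affine (cf. (1) for the definitions of $F$ and $G$).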

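Definition 1. A mapping $T$ is affine if, for any $\mathbf{r}_1$ and $\mathbf{r}_2$,

$$T\left(\tfrac{1}{2}\mathbf{r}_1 + \tfrac{1}{2}\mathbf{r}_2\right) = \tfrac{1}{2}T\mathbf{r}_1 + \tfrac{1}{2}T\mathbf{r}_2.$$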

Proposition 1. Let $\lambda > 0$. The following statements are equivalent:

1. $\operatorname{prox}_{G(\cdot)}$ is affine;

2. $\operatorname{prox}_{\lambda G(\cdot)}$ is affine;

3. $a\operatorname{prox}_{G(\cdot)} \circ b\mathbf{I} + c\mathbf{I}$ is affine for any scalars $a$, $b$ and $c$;

4. $\operatorname{prox}_{G^*(\cdot)}$ is affine;

5. $G$ is convex quadratic (or affine or constant) and its domain $\operatorname{dom}(G)$ is either $Q$ or the intersection of hyperplanes in $Q$.

In addition, if function $g$ is convex quadratic and its domain is the intersection of hyperplanes, then function $G$ defined in (1b) satisfies Part 5 above.


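Proposition 2. If $\operatorname{prox}_{G(\cdot)}$ is affine, then the following holds for any $\mathbf{r}_1$ and $\mathbf{r}_2$:

$$\operatorname{prox}_{G(\cdot)}(2\mathbf{r}_1 - \mathbf{r}_2) = 2\operatorname{prox}_{G(\cdot)}\mathbf{r}_1 - \operatorname{prox}_{G(\cdot)}\mathbf{r}_2. \tag{23}$$

Proof. Equation (23) is obtained by defining $\bar{\mathbf{r}}_1 := 2\mathbf{r}_1 - \mathbf{r}_2$ and $\bar{\mathbf{r}}_2 := \mathbf{r}_2$ and rearranging

$$\operatorname{prox}_{G(\cdot)}\left(\tfrac{1}{2}\bar{\mathbf{r}}_1 + \tfrac{1}{2}\bar{\mathbf{r}}_2\right) = \tfrac{1}{2}\operatorname{prox}_{G(\cdot)}\bar{\mathbf{r}}_1 + \tfrac{1}{2}\operatorname{prox}_{G(\cdot)}\bar{\mathbf{r}}_2.$$

□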


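Theorem 3 (Equivalence of Algorithms 1 and 5).

1. Assume that $\operatorname{prox}_{\lambda G(\cdot)}$ is affine. Given the sequences $\mathbf{y}_4^k$, $\mathbf{z}_4^k$, and $\mathbf{x}_4^k$ of Algorithm 5, if $\mathbf{y}_4^0$ and $\mathbf{z}_4^0$ satisfy $-\mathbf{z}_4^0 \in \partial G(\mathbf{B}\mathbf{y}_4^0 - \mathbf{b})$, then we can initialize Algorithm 1 with $\mathbf{x}_1^0 = \mathbf{x}_4^1$ and $\mathbf{z}_1^0 = \mathbf{z}_4^0 + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^1 + \mathbf{B}\mathbf{y}_4^0 - \mathbf{b})$, and recover the sequences $\mathbf{x}_1^k$ and $\mathbf{z}_1^k$ of Algorithm 1 through

$$\mathbf{x}_1^k = \mathbf{x}_4^{k+1}, \tag{24a}$$

$$\mathbf{z}_1^k = \mathbf{z}_4^k + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y}_4^k - \mathbf{b}). \tag{24b}$$

2. Assume that $\operatorname{prox}_{\lambda F(\cdot)}$ is affine. Given the sequences $\mathbf{x}_1^k$, $\mathbf{z}_1^k$, and $\mathbf{y}_1^k$ of Algorithm 1, if $\mathbf{x}_1^0$ and $\mathbf{z}_1^0$ satisfy $-\mathbf{z}_1^0 \in \partial F(\mathbf{A}\mathbf{x}_1^0)$, then we can initialize Algorithm 5 with $\mathbf{y}_4^0 = \mathbf{y}_1^1$ and $\mathbf{z}_4^0 = \mathbf{z}_1^0 + \lambda^{-1}(\mathbf{A}\mathbf{x}_1^0 + \mathbf{B}\mathbf{y}_1^1 - \mathbf{b})$, and recover the sequences $\mathbf{y}_4^k$ and $\mathbf{z}_4^k$ of Algorithm 5 through

$$\mathbf{y}_4^k = \mathbf{y}_1^{k+1}, \tag{25a}$$

$$\mathbf{z}_4^k = \mathbf{z}_1^k + \lambda^{-1}(\mathbf{A}\mathbf{x}_1^k + \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b}). \tag{25b}$$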

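Proof. We prove Part 1 only, by induction. (The proof for the other part is similar.) The initialization of Algorithm 1 clearly follows (24) at $k = 0$. Suppose that (24) holds at $k \ge 0$. We shall show that (24) holds at $k + 1$. We first show from the affine property of $\operatorname{prox}_{\lambda G(\cdot)}$:

$$\mathbf{B}\mathbf{y}_1^{k+1} = 2\mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{B}\mathbf{y}_4^k. \tag{26}$$

The optimization subproblems for $\mathbf{y}_1$ and $\mathbf{y}_4$ in Algorithms 1 and 5, respectively, are as follows:

$$\mathbf{y}_1^{k+1} = \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x}_1^k + \mathbf{B}\mathbf{y} - \mathbf{b} + \lambda\mathbf{z}_1^k\|_2^2,$$

$$\mathbf{y}_4^{k+1} = \arg\min_{\mathbf{y}} \; g(\mathbf{y}) + (2\lambda)^{-1}\|\mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y} - \mathbf{b} + \lambda\mathbf{z}_4^k\|_2^2.$$

Following the definition of $G$ in (1), we have

$$\mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b} = \operatorname{prox}_{\lambda G(\cdot)}(-\mathbf{A}\mathbf{x}_1^k - \lambda\mathbf{z}_1^k), \tag{27a}$$

$$\mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b} = \operatorname{prox}_{\lambda G(\cdot)}(-\mathbf{A}\mathbf{x}_4^{k+1} - \lambda\mathbf{z}_4^k), \tag{27b}$$

$$\mathbf{B}\mathbf{y}_4^k - \mathbf{b} = \operatorname{prox}_{\lambda G(\cdot)}(-\mathbf{A}\mathbf{x}_4^k - \lambda\mathbf{z}_4^{k-1}). \tag{27c}$$

The third step of Algorithm 5 is

$$\mathbf{z}_4^k = \mathbf{z}_4^{k-1} + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^k + \mathbf{B}\mathbf{y}_4^k - \mathbf{b}). \tag{28}$$

(Note that for $k = 0$, the assumption $-\mathbf{z}_4^0 \in \partial G(\mathbf{B}\mathbf{y}_4^0 - \mathbf{b})$ ensures the existence of $\mathbf{z}_4^{-1}$ in (27c) and (28).)

Then, (24) and (28) give us

$$\begin{aligned}
\mathbf{A}\mathbf{x}_1^k + \lambda\mathbf{z}_1^k &\overset{(24)}{=} \mathbf{A}\mathbf{x}_4^{k+1} + \lambda\mathbf{z}_4^k + \mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y}_4^k - \mathbf{b} \\
&= 2(\mathbf{A}\mathbf{x}_4^{k+1} + \lambda\mathbf{z}_4^k) - (\lambda\mathbf{z}_4^k - \mathbf{B}\mathbf{y}_4^k + \mathbf{b}) \\
&\overset{(28)}{=} 2(\mathbf{A}\mathbf{x}_4^{k+1} + \lambda\mathbf{z}_4^k) - (\mathbf{A}\mathbf{x}_4^k + \lambda\mathbf{z}_4^{k-1}).
\end{aligned}$$

Since $\operatorname{prox}_{\lambda G(\cdot)}$ is affine, we have (23). Once we plug into (23) the points $\mathbf{r}_1 = -\mathbf{A}\mathbf{x}_4^{k+1} - \lambda\mathbf{z}_4^k$, $\mathbf{r}_2 = -\mathbf{A}\mathbf{x}_4^k - \lambda\mathbf{z}_4^{k-1}$, and $2\mathbf{r}_1 - \mathbf{r}_2 = -\mathbf{A}\mathbf{x}_1^k - \lambda\mathbf{z}_1^k$ and then apply (27), we obtain (26).

Next, the third step of Algorithm 5 and (26) give us

$$\begin{aligned}
\mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b} + \lambda\mathbf{z}_1^k &\overset{(26),(24)}{=} 2(\mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b}) - (\mathbf{B}\mathbf{y}_4^k - \mathbf{b}) + \lambda\mathbf{z}_4^k + (\mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y}_4^k - \mathbf{b}) \\
&= (\mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b}) + \lambda\mathbf{z}_4^k + (\mathbf{A}\mathbf{x}_4^{k+1} + \mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b}) \\
&= (\mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b}) + \lambda\mathbf{z}_4^{k+1}.
\end{aligned}$$

This identity shows that the updates of $\mathbf{x}_1^{k+1}$ and $\mathbf{x}_4^{k+2}$ in Algorithms 1 and 5, respectively, have identical data, and therefore, we recover $\mathbf{x}_1^{k+1} = \mathbf{x}_4^{k+2}$.

Lastly, from the third step of Algorithm 1 and the identities above, it follows that

$$\begin{aligned}
\mathbf{z}_1^{k+1} &= \mathbf{z}_1^k + \lambda^{-1}(\mathbf{A}\mathbf{x}_1^{k+1} + \mathbf{B}\mathbf{y}_1^{k+1} - \mathbf{b}) \\
&= \mathbf{z}_1^k + \lambda^{-1}\left(\mathbf{A}\mathbf{x}_4^{k+2} + (\mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b} + \lambda\mathbf{z}_4^{k+1} - \lambda\mathbf{z}_1^k)\right) \\
&= \mathbf{z}_4^{k+1} + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^{k+2} + \mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b}).
\end{aligned}$$

Therefore, we obtain (24) at $k + 1$. □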
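Part 1 of Theorem 3 can be checked numerically on a scalar instance. The sketch below assumes $f(x) = |x|$, $g(y) = (y - q)^2/2$ and $\mathbf{A} = \mathbf{B} = 1$, so that $G$ is quadratic on the whole space and $\operatorname{prox}_{\lambda G(\cdot)}$ is affine. The data `b`, `q`, `lam`, the starting point, and the helper names are illustrative assumptions, not values from the paper.

```python
# Sketch: Algorithm 5 (x first) versus Algorithm 1 (y first) on a scalar
# problem with f(x) = |x|, g(y) = (y - q)^2 / 2, A = B = 1.

def soft(w, tau):
    """Scalar soft-thresholding, the prox of tau*|.|."""
    return max(abs(w) - tau, 0.0) * (1.0 if w > 0 else -1.0)


b, q, lam = 1.0, -0.5, 0.8   # hypothetical data


def x_step(y, z):
    # argmin_x |x| + (2 lam)^{-1} (x + y - b + lam z)^2
    return soft(b - y - lam * z, lam)


def y_step(x, z):
    # argmin_y (y - q)^2 / 2 + (2 lam)^{-1} (x + y - b + lam z)^2
    return (lam * q - lam * z - x + b) / (lam + 1.0)


n = 30
# Algorithm 5, started so that -z^0 = g'(y^0), i.e. -z^0 in dG(B y^0 - b).
y4 = [2.0]
z4 = [q - y4[0]]
x4 = [None]            # x^0 is never used by Algorithm 5
for k in range(n + 1):
    x_new = x_step(y4[k], z4[k])
    y_new = y_step(x_new, z4[k])
    x4.append(x_new)
    y4.append(y_new)
    z4.append(z4[k] + (x_new + y_new - b) / lam)

# Algorithm 1, initialized as in Theorem 3, Part 1.
x1 = x4[1]
z1 = z4[0] + (x4[1] + y4[0] - b) / lam
max_gap = 0.0
for k in range(n):
    # identities (24a) and (24b) at iteration k
    gap_x = abs(x1 - x4[k + 1])
    gap_z = abs(z1 - (z4[k] + (x4[k + 1] + y4[k] - b) / lam))
    max_gap = max(max_gap, gap_x, gap_z)
    y1 = y_step(x1, z1)
    x1 = x_step(y1, z1)
    z1 = z1 + (x1 + y1 - b) / lam
```

Replacing `g` by a non-quadratic function (for example another absolute value) generally breaks these identities, in line with the statement that the two update orders differ in general.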

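Remark 6. We can avoid the technical condition $-\mathbf{z}_4^0 \in \partial G(\mathbf{B}\mathbf{y}_4^0 - \mathbf{b})$ on Algorithm 5 in Theorem 3 Part 1. When it does not hold, we can use the always-true relation $-\mathbf{z}_4^1 \in \partial G(\mathbf{B}\mathbf{y}_4^1 - \mathbf{b})$ instead; correspondingly, we shall add one iteration to the iterates of Algorithm 5, namely, initialize Algorithm 1 with $\mathbf{x}_1^0 = \mathbf{x}_4^2$ and $\mathbf{z}_1^0 = \mathbf{z}_4^1 + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^2 + \mathbf{B}\mathbf{y}_4^1 - \mathbf{b})$ and recover the sequences $\mathbf{x}_1^k$ and $\mathbf{z}_1^k$ of Algorithm 1 through

$$\mathbf{x}_1^k = \mathbf{x}_4^{k+2}, \tag{29a}$$

$$\mathbf{z}_1^k = \mathbf{z}_4^{k+1} + \lambda^{-1}(\mathbf{A}\mathbf{x}_4^{k+2} + \mathbf{B}\mathbf{y}_4^{k+1} - \mathbf{b}). \tag{29b}$$

Similar arguments apply to the other part of Theorem 3.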


5  Application:  extended  monotropic  programming 


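In this section, we use an example to demonstrate that the equivalent algorithms may still have different per-iteration complexities and that the primal-dual algorithm, Algorithm 4, may be preferable to the others. The following extended monotropic program [2] arises in the setting of parallel and distributed computation:

$$\operatorname*{minimize}_{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N} \; \sum_{i=1}^N f_i(\mathbf{x}_i) \quad \text{subject to} \quad \sum_{i=1}^N \mathbf{A}_i\mathbf{x}_i = \mathbf{b}, \tag{30}$$

where $\mathbf{x}_i \in \mathbb{R}^{n_i}$, $\mathbf{A}_i \in \mathbb{R}^{m \times n_i}$, and $\mathbf{b} \in \mathbb{R}^m$, for $i = 1, \ldots, N$. To apply ADM, one can convert the problem into the following ADM-ready formulation by introducing variables/constraints $\mathbf{y}_i = \mathbf{A}_i\mathbf{x}_i$:

$$\operatorname*{minimize}_{\{\mathbf{x}_i\}, \{\mathbf{y}_i\}} \; \sum_{i=1}^N f_i(\mathbf{x}_i) + \iota_{\{\mathbf{y} : \sum_{i=1}^N \mathbf{y}_i = \mathbf{b}\}}(\mathbf{y}) \quad \text{subject to} \quad \mathbf{A}_i\mathbf{x}_i - \mathbf{y}_i = \mathbf{0}. \tag{31}$$

Problem (31) is in the form of (P1) with $\mathbf{x} := (\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N)$ and $\mathbf{y} := (\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_N)$. Therefore, ADM algorithms can be applied. In particular, Algorithm 1 has the following updates for every $i$ at iteration $k$ (cf. [5]):

$$\mathbf{y}_i^{k+1} = \mathbf{A}_i\mathbf{x}_i^k + \lambda\mathbf{z}_i^k - \frac{1}{N}\left(\sum_{j=1}^N (\mathbf{A}_j\mathbf{x}_j^k + \lambda\mathbf{z}_j^k) - \mathbf{b}\right), \tag{32a}$$

$$\mathbf{x}_i^{k+1} = \arg\min_{\mathbf{x}_i} \; f_i(\mathbf{x}_i) + (2\lambda)^{-1}\|\mathbf{A}_i\mathbf{x}_i - \mathbf{y}_i^{k+1} + \lambda\mathbf{z}_i^k\|_2^2, \tag{32b}$$

$$\mathbf{z}_i^{k+1} = \mathbf{z}_i^k + \lambda^{-1}(\mathbf{A}_i\mathbf{x}_i^{k+1} - \mathbf{y}_i^{k+1}). \tag{32c}$$

Once $\sum_{j=1}^N (\mathbf{A}_j\mathbf{x}_j^k + \lambda\mathbf{z}_j^k)$ is computed, the above three steps rely on data of subscript $i$ only, so they can be carried out for $i = 1, \ldots, N$ in parallel.

Algorithm 4 for problem (31) has the following updates for every $i$ at iteration $k$:

$$\bar{\mathbf{u}}_i^k = 2\mathbf{u}_i^k - \mathbf{u}_i^{k-1}, \tag{33a}$$

$$\mathbf{y}_i^{k+1} = \mathbf{y}_i^k + \lambda\bar{\mathbf{u}}_i^k - \frac{1}{N}\left(\sum_{j=1}^N (\mathbf{y}_j^k + \lambda\bar{\mathbf{u}}_j^k) - \mathbf{b}\right), \tag{33b}$$

$$\mathbf{u}_i^{k+1} = \arg\min_{\mathbf{u}_i} \; f_i^*(-\mathbf{A}_i^*\mathbf{u}_i) + \tfrac{\lambda}{2}\|\mathbf{u}_i - \mathbf{u}_i^k + \lambda^{-1}\mathbf{y}_i^{k+1}\|_2^2. \tag{33c}$$

Likewise, the above three steps can be carried out for $i = 1, \ldots, N$ in parallel except that computing $\sum_{j=1}^N (\mathbf{y}_j^k + \lambda\bar{\mathbf{u}}_j^k)$ requires data of subscripts $j = 1, \ldots, N$. In the distributed setting, the summation term requires communication.

The following corollary, which is a direct result of Theorem 2, establishes the equivalence between iterations (32) and (33).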

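Corollary 3. Suppose that $\mathbf{A}_i\mathbf{x}_i^0 = \lambda(\mathbf{u}_i^0 - \mathbf{u}_i^{-1}) + \mathbf{y}_i^0$ and $\mathbf{z}_i^0 = \mathbf{u}_i^0$ for $i = 1, 2, \cdots, N$. Then, iterations (32) and (33) are equivalent with the following identities for all $k \ge 0$:

$$\mathbf{A}_i\mathbf{x}_i^k = \lambda(\mathbf{u}_i^k - \mathbf{u}_i^{k-1}) + \mathbf{y}_i^k, \qquad \mathbf{z}_i^k = \mathbf{u}_i^k.$$

When (33c) is easy to solve, the complexity of algorithm (33) is smaller than that of algorithm (32).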


5.1  An  example  with  different  complexities 

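Problem. In problem (30), let $f_i(\cdot) = (1/2)\|\cdot\|_2^2$ and $\mathbf{A}_i\mathbf{A}_i^* = \mathbf{I}$ for $i = 1, 2, \cdots, N$. The problem becomes finding the minimal-norm solution $\mathbf{x}$ satisfying the constraint $\sum_{i=1}^N \mathbf{A}_i\mathbf{x}_i = \mathbf{b}$.

Primal-dual iteration: Since subproblem (33c) can be solved analytically, iteration (33) simplifies to:

$$\bar{\mathbf{u}}_i^k = 2\mathbf{u}_i^k - \mathbf{u}_i^{k-1}, \tag{34a}$$

$$\mathbf{y}_i^{k+1} = \mathbf{y}_i^k + \lambda\bar{\mathbf{u}}_i^k - \frac{1}{N}\left(\sum_{j=1}^N (\mathbf{y}_j^k + \lambda\bar{\mathbf{u}}_j^k) - \mathbf{b}\right), \tag{34b}$$

$$\mathbf{u}_i^{k+1} = (\lambda\mathbf{u}_i^k - \mathbf{y}_i^{k+1})/(\lambda + 1). \tag{34c}$$

The complexity of (34) for each $i$ and $k$ is $10m$ flops, plus the communication cost if the summation is taken in a distributed setting. To see this: (34a) has $2m$ flops; (34b) has $2m + 3m$ flops, ignoring the summation over $j$; (34c) has $3m$ flops. In addition, upon termination, we need an additional step to obtain $\mathbf{x}_i$ by solving

$$\operatorname*{minimize}_{\mathbf{x}_i} \; \|\mathbf{x}_i\|_2 \quad \text{subject to} \quad \mathbf{A}_i\mathbf{x}_i = \mathbf{y}_i,$$

which can be done analytically by $\mathbf{x}_i = \mathbf{A}_i^*\mathbf{y}_i$ for $mn_i$ flops for each $i$.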

ADM  iteration:  Since  subproblem  (32b)  can  be  solved  analytically,  iteration  (32)  reduces  to: 


yf+1  = 


N 


A.x; 


Az/  - 


k+ 1 


=(AI  +  A*A,)”iA*  (y/+  —  Az/), 
=zi  +  A_1(Ajx/+1  —  y/+1). 


(35a) 

(35b) 

(35c) 


We can store $(\lambda I + A_i^* A_i)^{-1} A_i^*$ for each $i$.  Note that in the distributed setting, the summation in (35a) has the
same cost as that in (34b).  Excluding the summation, the complexity of (35) for each $i$ and $k$ is $2mn_i + 10m$
flops, which is calculated as follows: (35a) takes $mn_i + 2m + 3m$ flops; (35b) takes $mn_i + 2m$ flops; (35c) takes $3m$
flops to find $z_i$.  In addition, it needs a preprocessing step to obtain $(\lambda I + A_i^* A_i)^{-1} A_i^* = A_i^* (\lambda I + I)^{-1}$
in $mn_i$ flops for each $i$.
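The identity used in the preprocessing step follows directly from the assumption $A_i A_i^* = I$. The short derivation below is written out as a reader's check under that assumption; it is not quoted from the paper.

```latex
\begin{align*}
(\lambda I + A_i^* A_i)\, A_i^*
  &= \lambda A_i^* + A_i^* (A_i A_i^*) = (\lambda + 1)\, A_i^*, \\
\text{hence}\quad (\lambda I + A_i^* A_i)^{-1} A_i^*
  &= (\lambda + 1)^{-1} A_i^* = A_i^* (\lambda I + I)^{-1}, \\
A_i x_i^{k+1}
  &= (\lambda + 1)^{-1} \bigl(y_i^{k+1} - \lambda z_i^k\bigr), \\
z_i^{k+1}
  &= z_i^k + \lambda^{-1}\bigl(A_i x_i^{k+1} - y_i^{k+1}\bigr)
   = (\lambda z_i^k - y_i^{k+1})/(\lambda + 1).
\end{align*}
```

The last line has the same form as (34c) with $z_i$ in place of $u_i$, which suggests why the primal-dual iteration can drop the explicit $x_i$ update.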

To  summarize,  we  can  compare  (34)  and  (35)  as  follows: 

•  For each $i$ and $k$, (34) takes $10m$ flops compared with the $2mn_i + 10m$ flops of (35).  Since (34) does not
explicitly update $x_i$, it saves $2mn_i$ flops.  This saving is significant when the $n_i$ are large.

•  In addition, (34) has a post-step and (35) has a pre-step, both of which take $mn_i$ flops for each $i$.

•  In the distributed setting, while the communication cost is the same for both algorithms, (34) is still
preferred over (35) since the $10m$ flops of (34) do not depend on $i$, which is good for load balancing.
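The comparison above can be illustrated with a small NumPy sketch that runs iterations (34) and (35) side by side and checks that they generate the same $y_i^k$ and that $z_i^k = u_i^k$. The function names, the random data, the problem sizes, and the choice $\lambda = 1$ are all assumptions made for illustration; the initialization is chosen so that $A_i x_i^0 = \lambda(u_i^0 - u_i^{-1}) + y_i^0$ and $z_i^0 = u_i^0$.

```python
import numpy as np

def project_sum(w, b):
    """Project the rows of w (shape N x m) onto {y : sum_i y_i = b}."""
    return w - (w.sum(axis=0) - b) / w.shape[0]

def primal_dual_34(y, u, u_prev, b, lam, iters):
    """Iteration (34); it never multiplies by A_i or forms x_i."""
    for _ in range(iters):
        u_bar = 2 * u - u_prev                      # (34a)
        y = project_sum(y + lam * u_bar, b)         # (34b)
        u_prev, u = u, (lam * u - y) / (lam + 1)    # (34c)
    return y, u

def adm_35(A, x, z, b, lam, iters):
    """Iteration (35) using the stored matrices (lam I + A_i^T A_i)^{-1} A_i^T."""
    M = [np.linalg.solve(lam * np.eye(Ai.shape[1]) + Ai.T @ Ai, Ai.T) for Ai in A]
    for _ in range(iters):
        Ax = np.stack([Ai @ xi for Ai, xi in zip(A, x)])
        y = project_sum(Ax + lam * z, b)                              # (35a)
        x = [Mi @ (yi - lam * zi) for Mi, yi, zi in zip(M, y, z)]     # (35b)
        Ax = np.stack([Ai @ xi for Ai, xi in zip(A, x)])
        z = z + (Ax - y) / lam                                        # (35c)
    return x, y, z

rng = np.random.default_rng(0)
m, sizes, lam = 4, (5, 7, 9), 1.0   # hypothetical sizes n_i and penalty
# Matrices with orthonormal rows, so that A_i A_i^T = I.
A = [np.linalg.qr(rng.standard_normal((n, m)))[0].T for n in sizes]
b = rng.standard_normal(m)

x0 = [rng.standard_normal(n) for n in sizes]
z0 = rng.standard_normal((len(sizes), m))
y0 = np.zeros((len(sizes), m))
Ax0 = np.stack([Ai @ xi for Ai, xi in zip(A, x0)])
u_prev0 = z0 - (Ax0 - y0) / lam     # enforces A_i x_i^0 = lam (u_i^0 - u_i^{-1}) + y_i^0

y_pd, u_pd = primal_dual_34(y0, z0, u_prev0, b, lam, 200)
x_adm, y_adm, z_adm = adm_35(A, x0, z0, b, lam, 200)
x_pd = [Ai.T @ yi for Ai, yi in zip(A, y_pd)]   # post-step x_i = A_i^T y_i
```

In this sketch the loop of `primal_dual_34` works only with length-$m$ vectors, whereas `adm_35` performs two products with $m \times n_i$ matrices per block and iteration, which mirrors the $2mn_i$ flop difference discussed above.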


6  Application:  Total  variation  image  denoising 


ADM (or Split Bregman [15]) has been applied to many image processing applications.  Here we apply the
previous equivalence results of ADM to derive several equivalent algorithms for total variation image
denoising.

The total variation model (ROF model [17]) for image denoising is

$$\underset{x \in BV(\Omega)}{\text{minimize}}\ \int_\Omega |Dx| + \frac{\alpha}{2}\|x - b\|_2^2,$$

where $x$ stands for an image and $BV(\Omega)$ is the set of all functions of bounded variation on $\Omega$.  The first term
is known as the total variation of $x$; minimizing it tends to yield a piecewise-constant solution.  The
discrete version is as follows:


$$\underset{x}{\text{minimize}}\ \|\nabla x\|_{2,1} + \frac{\alpha}{2}\|x - b\|_2^2.$$

Without loss of generality, we consider a two-dimensional image $x$, and the discrete total variation $\|\nabla x\|_{2,1}$
of image $x$ is defined as

$$\|\nabla x\|_{2,1} = \sum_{ij} |(\nabla x)_{ij}|,$$

where $|\cdot|$ is the 2-norm of the vector $(\nabla x)_{ij}$.  The equivalent ADM-ready form [15, Equation (3.1)] is

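$$\underset{x,y}{\text{minimize}}\ \|y\|_{2,1} + \frac{\alpha}{2}\|x - b\|_2^2 \quad \text{subject to } y - \nabla x = 0, \tag{36}$$

and its dual problem in the ADM-ready form [3, Equation (8)] is

$$\underset{v,u}{\text{minimize}}\ \frac{1}{2\alpha}\|\operatorname{div} u + \alpha b\|_2^2 + \iota_{\{v:\|v\|_{2,\infty} \le 1\}}(v) \quad \text{subject to } u - v = 0, \tag{37}$$

where $\|v\|_{2,\infty} = \max_{ij} |(v)_{ij}|$.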
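To make the two norms concrete, here is a minimal NumPy sketch that evaluates $\|\nabla x\|_{2,1}$ and $\|v\|_{2,\infty}$ on small arrays. The forward-difference stencil with periodic boundary conditions and the function names are assumptions chosen for illustration; the text does not fix a particular discretization here.

```python
import numpy as np

def grad(x):
    """Forward differences with periodic boundary; returns an array of shape (2, H, W)."""
    return np.stack([np.roll(x, -1, axis=0) - x, np.roll(x, -1, axis=1) - x])

def norm_2_1(g):
    """Sum over pixels of the 2-norm of each per-pixel vector."""
    return np.sqrt((g ** 2).sum(axis=0)).sum()

def norm_2_inf(v):
    """Maximum over pixels of the 2-norm of each per-pixel vector."""
    return np.sqrt((v ** 2).sum(axis=0)).max()

# A 4x4 step image: each row has two unit jumps (one at the wrap-around).
x = np.zeros((4, 4))
x[:, 2:] = 1.0
tv = norm_2_1(grad(x))

# Duality check: with v the normalized gradient field, <v, grad x> attains ||grad x||_{2,1}.
rng = np.random.default_rng(0)
g = grad(rng.standard_normal((5, 6)))
mag = np.sqrt((g ** 2).sum(axis=0))
v = g / np.maximum(mag, 1e-12)
pairing = (v * g).sum()
```

The pairing check reflects the fact that $\|\cdot\|_{2,\infty}$ is the dual norm of $\|\cdot\|_{2,1}$, which is what links (36) to its dual (37).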

In addition, the equivalent saddle-point problem is

$$\min_x \max_v\ \frac{\alpha}{2}\|x - b\|_2^2 + \langle v, \nabla x\rangle - \iota_{\{v:\|v\|_{2,\infty} \le 1\}}(v). \tag{38}$$

We list the following equivalent algorithms for solving the total variation image denoising problem.  The
equivalence result stated in Corollary 4 can be obtained from Theorems 1-3.
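1.  Algorithm 1 (primal ADM) on (36) is

$$x_1^{k+1} = \arg\min_x\ \frac{\alpha}{2}\|x - b\|_2^2 + (2\lambda)^{-1}\|\nabla x - y_1^k + \lambda z_1^k\|_2^2, \tag{39a}$$

$$y_1^{k+1} = \arg\min_y\ \|y\|_{2,1} + (2\lambda)^{-1}\|\nabla x_1^{k+1} - y + \lambda z_1^k\|_2^2, \tag{39b}$$

$$z_1^{k+1} = z_1^k + \lambda^{-1}(\nabla x_1^{k+1} - y_1^{k+1}). \tag{39c}$$

2.  Algorithm 3 (dual ADM) on (37) is

$$u_2^{k+1} = \arg\min_u\ \frac{1}{2\alpha}\|\operatorname{div} u + \alpha b\|_2^2 + \frac{\lambda}{2}\|v_2^k - u + \lambda^{-1} z_2^k\|_2^2, \tag{40a}$$

$$v_2^{k+1} = \arg\min_v\ \iota_{\{v:\|v\|_{2,\infty} \le 1\}}(v) + \frac{\lambda}{2}\|v - u_2^{k+1} + \lambda^{-1} z_2^k\|_2^2, \tag{40b}$$

$$z_2^{k+1} = z_2^k + \lambda(v_2^{k+1} - u_2^{k+1}). \tag{40c}$$

3.  Algorithm 4 (primal-dual) on (38) is

$$\bar{v}_3^k = 2v_3^k - v_3^{k-1}, \tag{41a}$$

$$x_3^{k+1} = \arg\min_x\ \frac{\alpha}{2}\|x - b\|_2^2 + (2\lambda)^{-1}\|\nabla x - \nabla x_3^k + \lambda \bar{v}_3^k\|_2^2, \tag{41b}$$

$$v_3^{k+1} = \arg\min_v\ \iota_{\{v:\|v\|_{2,\infty} \le 1\}}(v) - \langle v, \nabla x_3^{k+1}\rangle + \frac{\lambda}{2}\|v - v_3^k\|_2^2. \tag{41c}$$

4.  Algorithm 5 (primal ADM with order swapped) on (36) is

$$y_4^{k+1} = \arg\min_y\ \|y\|_{2,1} + (2\lambda)^{-1}\|\nabla x_4^k - y + \lambda z_4^k\|_2^2, \tag{42a}$$

$$x_4^{k+1} = \arg\min_x\ \frac{\alpha}{2}\|x - b\|_2^2 + (2\lambda)^{-1}\|\nabla x - y_4^{k+1} + \lambda z_4^k\|_2^2, \tag{42b}$$

$$z_4^{k+1} = z_4^k + \lambda^{-1}(\nabla x_4^{k+1} - y_4^{k+1}). \tag{42c}$$

Corollary 4.  Let $x_4^0 = b + \alpha^{-1}\operatorname{div} z_4^0$.  If the initializations of algorithms (39)-(42) satisfy
$y_1^0 = -z_2^0 = \nabla x_3^0 - \lambda(v_3^0 - v_3^{-1}) = y_4^1$ and $z_1^0 = v_2^0 = v_3^0 = z_4^0 + \lambda^{-1}(\nabla x_4^0 - y_4^1)$, then for $k \ge 1$, we have the
following equivalence between the iterations of the four algorithms:

$$y_1^k = -z_2^k = \nabla x_3^k - \lambda(v_3^k - v_3^{k-1}) = y_4^{k+1},$$

$$z_1^k = v_2^k = v_3^k = z_4^k + \lambda^{-1}(\nabla x_4^k - y_4^{k+1}).$$

Remark 7.  In any of the four algorithms, the $\nabla$ or $\operatorname{div}$ operator is separated in a different subproblem from
the term $\|\cdot\|_{2,1}$ or its dual norm $\|\cdot\|_{2,\infty}$.  The $\nabla$ or $\operatorname{div}$ operator is translation invariant, so their subproblems
can be solved by a diagonalization trick [18].  The subproblems involving the term $\|\cdot\|_{2,1}$ or the indicator
function $\iota_{\{v:\|v\|_{2,\infty} \le 1\}}$ have closed-form solutions.  Therefore, in addition to the equivalence results, all
four algorithms have essentially the same per-iteration cost.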
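The relation between Algorithm 1 and the primal-dual algorithm in Corollary 4 can be checked numerically on a tiny image. The sketch below is only an illustration under several assumptions that are not in the paper: a dense forward-difference gradient with periodic boundary, a direct linear solve in place of the diagonalization trick, random data, and hypothetical function names. The closed forms used are per-pixel shrinkage for (39b) and per-pixel projection onto the unit ball for (41c).

```python
import numpy as np

def grad_matrix(h, w):
    """Dense forward-difference gradient, periodic boundary, shape (2*h*w, h*w)."""
    n = h * w
    idx = np.arange(n).reshape(h, w)
    I = np.eye(n)
    Dv = I[np.roll(idx, -1, axis=0).ravel()] - I   # vertical differences
    Dh = I[np.roll(idx, -1, axis=1).ravel()] - I   # horizontal differences
    return np.vstack([Dv, Dh])

def pixel_norms(g):
    """2-norm of the gradient vector at each pixel."""
    return np.sqrt((g.reshape(2, -1) ** 2).sum(axis=0))

def shrink(w, lam):
    """Closed form of argmin_y ||y||_{2,1} + (2 lam)^{-1} ||w - y||_2^2."""
    mag = pixel_norms(w)
    scale = np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-15)
    return (w.reshape(2, -1) * scale).ravel()

def project_ball(v):
    """Projection onto {v : ||v||_{2,inf} <= 1}."""
    return (v.reshape(2, -1) / np.maximum(pixel_norms(v), 1.0)).ravel()

def primal_adm(G, b, alpha, lam, y, z, iters):
    """Iteration (39); returns the lists of y and z iterates."""
    H = alpha * np.eye(G.shape[1]) + G.T @ G / lam
    ys, zs = [], []
    for _ in range(iters):
        x = np.linalg.solve(H, alpha * b + G.T @ (y - lam * z) / lam)   # (39a)
        y = shrink(G @ x + lam * z, lam)                                # (39b)
        z = z + (G @ x - y) / lam                                       # (39c)
        ys.append(y)
        zs.append(z)
    return ys, zs

def primal_dual(G, b, alpha, lam, x, v, v_prev, iters):
    """Iteration (41); returns grad x - lam (v - v_prev) and the v iterates."""
    H = alpha * np.eye(G.shape[1]) + G.T @ G / lam
    ys, vs = [], []
    for _ in range(iters):
        v_bar = 2 * v - v_prev                                                  # (41a)
        x = np.linalg.solve(H, alpha * b + G.T @ (G @ x - lam * v_bar) / lam)   # (41b)
        v_prev, v = v, project_ball(v + G @ x / lam)                            # (41c)
        ys.append(G @ x - lam * (v - v_prev))
        vs.append(v)
    return ys, vs

rng = np.random.default_rng(0)
h, w, alpha, lam = 4, 5, 2.0, 0.7          # hypothetical image size and parameters
G = grad_matrix(h, w)
b = rng.standard_normal(h * w)
x0 = rng.standard_normal(h * w)
v0 = rng.standard_normal(2 * h * w)
v_m1 = rng.standard_normal(2 * h * w)
y0 = G @ x0 - lam * (v0 - v_m1)            # y_1^0 chosen as in Corollary 4, with z_1^0 = v_3^0

ys1, zs1 = primal_adm(G, b, alpha, lam, y0, v0, 20)
ys3, vs3 = primal_dual(G, b, alpha, lam, x0, v0, v_m1, 20)
max_gap = max(np.abs(a - c).max() for a, c in zip(ys1 + zs1, ys3 + vs3))
```

In this setup `max_gap` stays at rounding level, consistent with $y_1^k = \nabla x_3^k - \lambda(v_3^k - v_3^{k-1})$ and $z_1^k = v_3^k$. Both loops perform one linear solve and one closed-form per-pixel step per iteration, in line with Remark 7.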